(57) Abstract: A method and image processing system are disclosed that extract facial feature information from an image using biometrics information of a face. Regions of interest such as a face, eyes, nose and mouth are defined in the image. A combination of disparity mapping, edge detection and filtering is then used to extract coordinates/positions of the facial features in the regions of interest.
System and method for biometrics-based facial feature extraction



FIELD OF THE INVENTION 

The present invention pertains generally to the field of detecting human faces, 
and in particular, the invention relates to a system and method for locating facial features in a 
digital image using biometrics information. 


BACKGROUND OF THE INVENTION 

Systems and methods are known that analyze digital images and recognize 
human faces. Extraction of facial feature information has been used for various applications 
such as in automated/surveillance systems, monitoring systems, human interfaces to 
computers, television and video signal analysis.

Conventional facial detection systems use methods such as facial color tone 
detection, template matching or edge detection approaches. There are, however, numerous 
shortcomings to these types of conventional systems. In general, these conventional systems 
lack robustness, e.g., due to variations in human races, facial expression and lighting 
conditions.

More particularly, in systems using facial color tone detection, for example, a 
tint conversion is applied to an input digital image to determine skin-color regions. A mask 
pattern based upon the skin-color regions is used to extract characteristic facial regions. 
However, depending on light sources, the hue of the respective facial regions may change, which makes it difficult to extract accurate information. In addition, movement while the digital image is being generated may cause shadows, which also makes it difficult to detect the skin-color regions accurately.

In systems using template matching, facial templates are first determined based upon average positions of facial features (i.e., eyes, nose and mouth) for a particular sex or race. A digital image is then matched to a template to identify sex or race. One shortcoming of this type of system is that expressions, e.g., a smile, may cause the wrong template to be used, which leads to incorrect results.

Conventional systems using edge detection are also known. Edge detection 
approaches are useful in locating the position of eyes because the eyes typically have high 
edge density values. However, eyeglasses and facial hair such as a mustache may cause these conventional systems to generate erroneous results. In addition, edge detection typically cannot be used to determine the position of a nose.

There thus exists in the art a need for improved systems and methods for 
extraction of facial features from digital images that provide robust performance despite
variations in the facial features due to movement or different facial expressions. 

BRIEF SUMMARY OF THE INVENTION 

It is an object of the present invention to address the limitations of the conventional extraction systems discussed above.

It is a further object of the invention to provide a facial feature extraction 
system that uses biometrics information to define regions of interest in an image and to 
accurately extract positions of facial features. 

In one aspect of the present invention, an image processing device includes a disparity detector that compares locations of like pixel information in a pair of images and determines disparity information, and a region detector which identifies a region of interest in one of the images in accordance with the disparity information. The region of interest includes a plurality of facial features. The device also includes a first position detector coupled to the region detector which identifies a position of one of the facial features in accordance with the disparity information.

In another aspect of the invention, an image processing apparatus includes a disparity detector that determines disparity information and an outline identifier that determines approximate boundaries of a face in an image based upon a comparison of a predetermined threshold value and the disparity information. The apparatus also includes a nose position identifier that identifies a position of a nose in the face in accordance with the disparity information within a center region of the face.

One embodiment of the invention relates to a method of determining positions of facial features in an image that includes the steps of calculating a disparity between a pair of images and determining a face region of interest (ROI) in at least one of the images. The method also includes the step of identifying a nose position within the face region of interest in accordance with the calculated disparity.

Another embodiment of the invention relates to a computer-readable memory medium including code for processing a pair of images. The memory medium includes code to compare locations of like pixel information in a pair of images to determine disparity information and code to identify a region of interest in one of the images in accordance with the disparity information. The region of interest includes a plurality of facial features. The memory medium also includes code to identify a position of one of the facial features in accordance with the disparity information.

These and other embodiments and aspects of the present invention are exemplified in the following detailed disclosure.

BRIEF DESCRIPTION OF DRAWINGS 

The features and advantages of the present invention can be understood by reference to the detailed description of the preferred embodiments set forth below taken with the drawings, in which:

Fig. 1 is a block diagram of a facial feature extraction system in accordance with one aspect of the present invention.

Fig. 2 is a block diagram of an exemplary computer system capable of supporting the system of Fig. 1.

Fig. 3 is a block diagram showing the architecture of the computer system of Fig. 2.

Fig. 4 is a block diagram showing an exemplary arrangement in accordance with a preferred embodiment of the invention.

Figs. 5A and 5B are schematic views of a subject in accordance with one embodiment of the invention.

Fig. 6 is a disparity map in accordance with a preferred embodiment of the invention.

Fig. 7 is a schematic diagram of an image showing various regions of interest.

Fig. 8 is a flow chart of a process in accordance with one aspect of the invention.

Fig. 9 is a schematic diagram of a disparity map of a nose region in accordance with one aspect of the invention.

Fig. 10 is a flow chart of a process in accordance with one aspect of the invention.

Fig. 11 is a diagram showing a Radon projection in accordance with one aspect of the invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to Fig. 1, a facial feature extraction system 10 is shown. Generally, the system 10 uses human face biometrics information (i.e., general positions of the nose, eyes and mouth) to define regions of interest (ROI) in an image. A combination of disparity mapping, edge detection and filtering is used to extract coordinates/positions of the facial features.

In a preferred embodiment, the system 10 is implemented by computer readable code executed by a data processing apparatus. The code may be stored in a memory within the data processing apparatus or read/downloaded from a memory medium such as a CD-ROM or floppy disk. In other embodiments, hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention. For example, the invention may be implemented on a digital television platform using a Trimedia processor for processing and a television monitor for display. The invention can also be implemented on a computer as shown in Fig. 2.

As shown in Fig. 2, a computer 30 includes a network connection 31 for interfacing to a network, such as a variable-bandwidth network or the Internet, and a fax/modem connection 32 for interfacing with other remote sources such as a video or digital camera (not shown). The computer 30 also includes a display 33 for displaying information (including video data) to a user, a keyboard 34 for inputting text and user commands, a mouse 35 for positioning a cursor on the display 33 and for inputting user commands, a disk drive 36 for reading from and writing to floppy disks installed therein, and a CD-ROM drive 37 for accessing information stored on CD-ROM. The computer 30 may also have one or more peripheral devices attached thereto, such as a pair of video conference cameras for inputting images, or the like, and a printer 38 for outputting images, text, or the like.

Fig. 3 shows the internal structure of the computer 30, which includes a memory 40 that may include a Random Access Memory (RAM), Read-Only Memory (ROM) and a computer-readable medium such as a hard disk. The items stored in the memory 40 include an operating system 41, data 42 and applications 43. In preferred embodiments of the invention, the operating system 41 is a windowing operating system, such as UNIX, although the invention may be used with other operating systems as well, such as Microsoft Windows 95. Among the applications stored in memory 40 are a video coder 44, a video decoder 45 and a frame grabber 46. The video coder 44 encodes video data in a conventional manner, and the video decoder 45 decodes video data which has been coded in the conventional manner. The frame grabber 46 allows single frames from a video signal stream to be captured and processed.

Also included in the computer 30 are a central processing unit (CPU) 50, a communication interface 51, a memory interface 52, a CD-ROM drive interface 53, a video interface 54 and a bus 55. The CPU 50 comprises a microprocessor or the like for executing computer readable code, i.e., applications such as those noted above, out of the memory 40. Such applications may be stored in memory 40 (as noted above) or, alternatively, on a floppy disk in disk drive 36 or a CD-ROM in CD-ROM drive 37. The CPU 50 accesses the applications (or other data) stored on a floppy disk via the memory interface 52 and accesses the applications (or other data) stored on a CD-ROM via the CD-ROM drive interface 53.

Application execution and other tasks of the computer 30 may be initiated using the keyboard 34 or the mouse 35. Output results from applications running on the computer 30 may be displayed to a user on the display 33 or, alternatively, output via the network connection 31. For example, input video data may be received through the video interface 54 or the network connection 31. The input video data may be decoded by the video decoder 45. Output video data may be coded by the video coder 44 for transmission through the video interface 54 or the network connection 31. The display 33 preferably comprises a display processor for forming video images based on decoded video data provided by the CPU 50 over the bus 55. Output results from the various applications may be provided to the printer 38.

Returning to Fig. 1, a pair of stereo digital images comprising a left frame 60 and a right frame 61 are input to the system 10. For example, the digital images may be received from two cameras 62 and 63 (shown in Fig. 4) and stored in the memory 40 for subsequent processing. The cameras 62 and 63 may be part of another system such as a video conferencing system or a security system. The cameras 62 and 63 are located close to each other, and a subject 64 is located a short distance away from them. As shown in Fig. 4, the cameras 62 and 63 are 5 to 6 inches apart and the subject is 3 feet away from the cameras 62 and 63. It should be understood, however, that the invention is not limited to these distances and that the distances shown in Fig. 4 are merely exemplary.

Preferably, the camera 62 takes a front view image of the subject 64 as shown in Fig. 5A. The camera 63 takes an offset or side view of the subject 64 as shown in Fig. 5B. This allows the left frame 60 and the right frame 61 to be compared to determine a disparity map. In a preferred embodiment of the invention, the left frame 60 (image A) is compared to the right frame 61 (image B). The reverse comparison, however, may also be performed.

The digital images can be conceptualized as comprising a plurality of horizontal scan lines and a plurality of vertical columns that form an array of pixels. The number of scan lines and columns determines the resolution of the digital image. To determine the disparity map, scan lines are lined up, e.g., scan line 10 of image A matches scan line 10 of image B. A pixel on scan line 10 of image A is then matched to its corresponding pixel in scan line 10 of image B. So, for example, if the 15th pixel of scan line 10 of image A matches the 10th pixel of scan line 10 of image B, the disparity is calculated as follows: 15 - 10 = 5. It is noted that when the left and right cameras 62 and 63 are closely located, the pixels of foreground information of an image, e.g., a human face, will have a larger disparity than pixels of background information. The disparity calculations are performed by a disparity detector 11 shown in Fig. 1.
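The scan-line matching described above can be sketched in Python. The text does not say how corresponding pixels are found, so this sketch assumes a simple sum-of-absolute-differences (SAD) window match along the same scan line; the names `scanline_disparity` and `disparity_map` and the parameters `max_disp` and `half_win` are hypothetical. Disparity is reported as the column in image A minus the matching column in image B, as in the 15 - 10 = 5 example.

```python
def scanline_disparity(row_a, row_b, max_disp=16, half_win=2):
    """Per-pixel disparity for one scan line of image A (left frame) matched
    against the same scan line of image B (right frame).

    Hypothetical SAD block matching: for each column x in A, try columns
    x - d in B for d = 0..max_disp and keep the d with the lowest cost.
    The returned value is x_A - x_B (e.g. 15 - 10 = 5).
    """
    n = len(row_a)
    disparities = [0] * n
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            xb = x - d  # candidate matching column in image B
            if xb < 0:
                break  # candidate would fall outside the scan line
            cost = 0
            for k in range(-half_win, half_win + 1):
                ia = min(max(x + k, 0), n - 1)   # clamp window to the row
                ib = min(max(xb + k, 0), n - 1)
                cost += abs(row_a[ia] - row_b[ib])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities


def disparity_map(image_a, image_b, **kwargs):
    """Apply scanline_disparity to every pair of aligned scan lines."""
    return [scanline_disparity(ra, rb, **kwargs) for ra, rb in zip(image_a, image_b)]
```

Because matching is restricted to the same scan line, the cameras are implicitly assumed to be rectified (scan line 10 of A corresponds to scan line 10 of B), which is the same assumption the text makes.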

A disparity map based upon the disparity calculations may be stored in the memory 40. Each scan line (or column) of the image would have a profile consisting of a disparity for each pixel in that scan line (or column). Fig. 6 is an example of a graphical representation of a disparity map based on a digital image of the subject 64. In this embodiment, the grayscale level of each pixel indicates the magnitude of the calculated disparity for that pixel: the darker the grayscale level, the lower the disparity.

A disparity threshold may be chosen, e.g., 10, and any disparity above the disparity threshold indicates that the pixel is foreground information (i.e., the subject 64), while any disparity below 10 indicates that the pixel is background information. The selection of the disparity threshold is based in part on the distances discussed above in regard to Fig. 4. For example, a lower disparity threshold may be used if the subject 64 is positioned at a greater distance from the cameras 62 and 63, or a higher disparity threshold may be used if the cameras 62 and 63 are further apart from each other.

As shown in Fig. 7, a foreground 70 and a background 71 of the left frame are determined based on the calculated disparity map and the disparity threshold. The foreground 70 essentially represents the head and body of the subject 64. Preferably, as shown in Fig. 7, the foreground 70 should comprise approximately 50 percent of the frame (i.e., 50 percent of the total number of pixels). This ensures that the face of the subject is neither too large, which could cause portions of the face to be truncated, nor too small, which could cause difficulties in data processing. Of course, the invention is not limited to this size of the foreground 70.
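A minimal sketch of the foreground/background split, assuming the disparity map is a list of rows of integers. The text leaves open the case of a disparity exactly equal to the threshold; this sketch (as an assumption) treats it as background. The hypothetical helper `foreground_fraction` can be used to check the approximately 50 percent guideline.

```python
def foreground_mask(disparity_map, t_dm=10):
    """Return 1 for pixels whose disparity exceeds t_dm (foreground, i.e. the
    subject) and 0 otherwise (background). Ties count as background here."""
    return [[1 if d > t_dm else 0 for d in row] for row in disparity_map]


def foreground_fraction(mask):
    """Fraction of all pixels in the frame labelled as foreground."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```

For example, a frame where `foreground_fraction` returns about 0.5 matches the preferred framing described above.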
A face region of interest (face ROI) 72 is then determined by a face position determinator 12 (shown in Fig. 1). As shown in Fig. 7, the face ROI 72 is a rectangular region bounded by boundary lines 74, 75, 76 and 77.

Fig. 8 is a flow chart showing the steps for determining the upper boundary line 74. In step S1, the disparity threshold (T_dm) and a top threshold (T_top) are read. A variable numWidth is also set to zero. Selection of the disparity threshold is discussed above. In the embodiment shown in Fig. 7, the value of T_top is equal to the number of pixels between points A and B. Similar to the selection of the disparity threshold, T_top is based in part on the distances shown in Fig. 4. For example, as discussed above, the foreground 70 is approximately 50 percent of the frame. In this configuration, the value of T_top is selected to be approximately 20 percent of the total width of a scan line. The invention, however, is not limited to this T_top value.

In step S2, the profile of the top scan line is retrieved. As discussed above, this consists of the calculated disparity values for each pixel in that scan line. Next, if the disparity value (dm) for a particular pixel is greater than T_dm, the value of numWidth is increased by one in step S4. This determination is made for each pixel in that scan line. Thus, if 20 pixels in one scan line have dm values greater than T_dm, numWidth would have a value of 20. In step S5, if the value of numWidth is greater than T_top, the current scan line is determined to be the upper boundary line 74. Otherwise, numWidth is reset to zero in step S6 and the next lower scan line profile (i.e., proceeding from top to bottom) is retrieved. The steps are then repeated until the upper boundary line 74 is determined.
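The Fig. 8 loop translates almost directly into code. This is a sketch, assuming the disparity map is a list of scan-line profiles ordered from top to bottom; `upper_boundary` is a hypothetical name, and the step numbers in the comments follow the text.

```python
def upper_boundary(disparity_map, t_dm, t_top):
    """Return the index of the upper boundary scan line (line 74), or None.

    Scans from the top scan line downward and stops at the first line in
    which more than t_top pixels have a disparity greater than t_dm.
    """
    for y, profile in enumerate(disparity_map):   # S2: next profile, top first
        num_width = 0                              # S1 / S6: reset the counter
        for dm in profile:
            if dm > t_dm:
                num_width += 1                     # S4: count foreground pixel
        if num_width > t_top:                      # S5: wide enough?
            return y
    return None
```

Returning `None` covers the case, not discussed in the text, where no scan line is wide enough (for example, no subject in the frame).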

To determine the lower boundary line 75, steps similar to those shown in Fig. 8 are followed. The value of T_dm is the same. The value of a bottom threshold (T_bottom), which is used in place of T_top, is equal to the number of pixels between points D and C (shown in Fig. 7). The value of T_bottom is determined in a manner similar to that of T_top. However, unlike step S2 shown in Fig. 8, this process starts with the bottom scan line and works up. Since the process works up from the bottom scan line, the values of numWidth will be greater than T_bottom until a vicinity near the neck of the subject is reached. Accordingly, the lower boundary line 75 is determined to be the scan line at which numWidth becomes less than T_bottom.
The left boundary line 76 and the right boundary line 77 are determined in a similar manner. A left threshold (T_left) is equal to the number of pixels between points A and E shown in Fig. 7. A right threshold (T_right) is equal to the number of pixels between points B and F. For determining the left boundary line 76, the process starts with a profile of the left-most column of pixels of the frame and proceeds toward the right side of the frame. For determining the right boundary line 77, the process starts with a profile of the right-most column of pixels of the frame and proceeds toward the left side of the frame. The left and right boundary lines 76 and 77 are determined to be the column at which the value of numHeight (which is used in place of numWidth) is greater than T_left and T_right, respectively.
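The remaining three boundaries reuse the same counting idea with a different scan direction and comparison. A sketch, assuming the same list-of-rows disparity map and, as the text implies, that the subject's body extends to the bottom of the frame (otherwise the bottom-up search would stop at the first background-only scan line). All function names are hypothetical.

```python
def _count_above(values, t_dm):
    """numWidth / numHeight: number of values with disparity greater than t_dm."""
    return sum(1 for v in values if v > t_dm)


def _column(dmap, x):
    """Profile of pixel column x, top to bottom."""
    return [row[x] for row in dmap]


def lower_boundary(dmap, t_dm, t_bottom):
    """Work up from the bottom scan line; the first line whose count drops
    below t_bottom (near the neck) is the lower boundary line 75."""
    for y in range(len(dmap) - 1, -1, -1):
        if _count_above(dmap[y], t_dm) < t_bottom:
            return y
    return None


def left_boundary(dmap, t_dm, t_left):
    """Scan columns left to right; first column with count > t_left (line 76)."""
    for x in range(len(dmap[0])):
        if _count_above(_column(dmap, x), t_dm) > t_left:
            return x
    return None


def right_boundary(dmap, t_dm, t_right):
    """Scan columns right to left; first column with count > t_right (line 77)."""
    for x in range(len(dmap[0]) - 1, -1, -1):
        if _count_above(_column(dmap, x), t_dm) > t_right:
            return x
    return None
```

With a head, a narrower neck and a wide body, T_left and T_right must be large enough that the body columns alone do not trigger the column tests; in the usage below the body contributes only two rows per column.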
The face ROI 72 is then defined by a rectangle ABCD formed by the intersection of the boundary lines 74-77. The face ROI 72 is preferably rectangular; however, it is not limited to this shape. For example, the face ROI may be a square, circle or oval. To calculate an oval, for example, the foreground 70 may first be determined using the disparity map. The neck region of the subject is then determined by selecting a threshold (T_neck) and following a process similar to that used to determine the lower boundary line 75. The background 71 and the portion of the foreground 70 below the neck region are removed or set to a value of zero. A curve (i.e., oval) fitting routine is then used to approximate the shape of the remainder.

Once the face ROI 72 has been obtained, the position of the nose can be determined. This is done by the nose position determinator 13 shown in Fig. 1. First, a center 78 (shown in Fig. 7) of the face ROI 72 is determined. Since the length and width of the rectangle ABCD and its position in the frame are known, the center 78 can easily be obtained. In a similar manner, if the face ROI is a square, circle or oval, the center can easily be obtained. A center region 79 is defined to be approximately 10 percent of the area of the face ROI 72. As shown in Fig. 7, the center region 79 is a square; however, other shapes may be used.

The nose position is located at the place with the highest disparity value (dm) within the center region 79. To determine the areas having the highest dm, a histogram process may be used. Computing a histogram of an image can be performed quickly because it requires little computation. This results in one or more areas 80 and 81 within the center region 79 having the highest dm, as shown in Fig. 9. The areas 80 and 81 typically each include a plurality of pixels. In the case of more than one area, a center 82 of the largest area 81 is taken as the nose position; the other, smaller areas 80 may be noise or flat spots in the nose of the subject 64. In the unlikely situation of two or more areas having the same size and being the largest, an average may be taken to determine the center position.

To determine the largest area 81, the following process is preferred. After the areas 80 and 81 are determined, the pixels within these areas 80 and 81 are set to a value of one. All the other pixels within the center region 79 are set to a value of zero. This quantizes the center region 79 in a binary manner. The height and/or width of each of the areas 80 and 81 (i.e., based on the number of pixels in the area) is determined. The area having the largest height and/or width is the largest area 81. The center of the largest area 81 is determined to be the nose position.
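The center-region and nose steps can be sketched as follows, under several assumptions that the text leaves open: the ROI is given as inclusive (top, left, bottom, right) pixel coordinates, the center region is a square whose area is about 10 percent of the ROI, the histogram process simply selects the highest disparity value present, areas are 4-connected groups of pixels, and an area's size is the larger of its bounding-box height and width. The names `center_region` and `nose_position` are hypothetical.

```python
import math
from collections import Counter


def center_region(roi):
    """Square region centered on the face ROI with about 10% of its area.
    roi and the result are inclusive (top, left, bottom, right) tuples."""
    top, left, bottom, right = roi
    height, width = bottom - top + 1, right - left + 1
    cy, cx = (top + bottom) // 2, (left + right) // 2
    side = max(1, round(math.sqrt(0.10 * height * width)))
    half = side // 2
    return (cy - half, cx - half, cy - half + side - 1, cx - half + side - 1)


def nose_position(dmap, region):
    """Center (row, col) of the largest area holding the peak disparity."""
    top, left, bottom, right = region
    pixels = [(y, x) for y in range(top, bottom + 1) for x in range(left, right + 1)]
    histogram = Counter(dmap[y][x] for y, x in pixels)
    peak = max(histogram)                                  # highest occupied bin
    binary = {p for p in pixels if dmap[p[0]][p[1]] == peak}  # binary quantization

    areas, seen = [], set()
    for start in sorted(binary):                           # 4-connected components
        if start in seen:
            continue
        seen.add(start)
        stack, area = [start], []
        while stack:
            y, x = stack.pop()
            area.append((y, x))
            for q in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if q in binary and q not in seen:
                    seen.add(q)
                    stack.append(q)
        areas.append(area)

    def extent(area):                                      # larger of height, width
        ys, xs = [p[0] for p in area], [p[1] for p in area]
        return max(max(ys) - min(ys) + 1, max(xs) - min(xs) + 1)

    def center(area):
        ys, xs = [p[0] for p in area], [p[1] for p in area]
        return ((min(ys) + max(ys)) / 2, (min(xs) + max(xs)) / 2)

    largest = max(extent(a) for a in areas)
    centers = [center(a) for a in areas if extent(a) == largest]
    # Equally large areas are averaged, as the text suggests for ties.
    return (sum(c[0] for c in centers) / len(centers),
            sum(c[1] for c in centers) / len(centers))
```

In a real disparity map the peak value may be shared by very few pixels; binning the histogram more coarsely (for example, keeping the top few disparity levels) would be a natural variation, but the text does not specify one.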

After the nose position is obtained by the nose position determinator 13, left-eye and right-eye determinators 14 and 15 (shown in Fig. 1) determine the positions of the eyes. A rough position of the eyes is first estimated from the position of the nose. As shown in Fig. 7, a left-eye ROI 82 and a right-eye ROI 83 are estimated to be the left and right halves, respectively, of the area above the nose position.

Fig. 10 shows a flow chart for determining the eye positions. In step S10, a luminance image of the left frame 60 is obtained from a luminance detector 16 (shown in Fig. 1). This results in an image wherein the eyes typically have a minimum gray level (i.e., appear as dark areas). The inverse of this image is obtained in step S11. In the inverse image, the eyes have a maximum gray level (i.e., appear as bright areas). This allows the processing calculations to be performed in a more efficient manner. In step S12, a morphological "close" operation is performed on the inverse image.

A morphological filter is a two-step minimum-maximum process that enhances some facial features such as the eyes and the mouth. For example, in the filter, the minimum gray level in a 3 x 3 block is first obtained, and the minimum value is assigned to the center of the block. The maximum gray level in the 3 x 3 block is then obtained, and the maximum value is assigned to the center of the block. This reduces the dynamic range and increases the local contrast of the image.
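Steps S11 and S12 can be sketched on a grayscale image stored as a list of rows. The filter below follows the order given in the text (3 x 3 minimum first, then 3 x 3 maximum). Note that in standard grayscale-morphology terminology a minimum followed by a maximum is usually called an opening, whereas a closing applies the maximum first; the sketch keeps the text's order. The 8-bit gray range, the border handling (the window is clipped at the image edge) and the function names are assumptions.

```python
def invert(image, max_level=255):
    """Step S11: inverse image, so dark eyes become bright (8-bit assumed)."""
    return [[max_level - v for v in row] for row in image]


def _filter3x3(image, reduce_fn):
    """Apply reduce_fn (min or max) over each 3 x 3 block and write the
    result to the block's center; the window is clipped at the borders."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            block = [image[yy][xx]
                     for yy in range(max(0, y - 1), min(h, y + 2))
                     for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = reduce_fn(block)
    return out


def min_max_filter(image):
    """Step S12 as described in the text: 3 x 3 minimum, then 3 x 3 maximum."""
    return _filter3x3(_filter3x3(image, min), max)
```

On the inverse image, a bright speck smaller than 3 x 3 is suppressed by this filter, while a bright region at least 3 x 3 in size (such as an eye at a suitable resolution) is preserved.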

In step S13, an edge density map is formed by taking the maximum edge strength using the luminance from the luminance detector 16 and a chrominance from the chrominance detector 17. This is performed by an edge map detector 18 (shown in Fig. 1). Preferably, the edge detection results are obtained by a Sobel operation on the luminance and chrominance components of the left frame 60 (see, e.g., J.S. Lim, "Two-Dimensional Signal and Image Processing," Prentice-Hall, 1990, pp. 498-485, incorporated herein by reference). It is noted, however, that other methods of edge detection may be used, such as a Roberts operation.

The Sobel operation obtains gradient vectors at respective pixels in the input image. The direction of the gradient vector indicates the direction in which the gradient of brightness of the image is largest. The regions along the pixels having the largest gradient vector magnitudes typically form edges in the image. From this data, the edge density map may be generated.
The edge density map is obtained for the detection of edge and texture density around the eye areas of the subject 64. The eye areas typically have a high edge density, which is defined as the number of neighboring pixels that are on an edge within a given neighborhood. For example, in a 3 x 3 neighborhood, the edge density may range from 0 to 9. A value of 0 means that no pixels are on an edge, while a value of 9 means that all the pixels in that neighborhood are on an edge. Preferably, in the present invention, 5 x 5 neighborhoods are used, giving a range of 0 to 25.
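A sketch of step S13 and the edge density computation, assuming each channel is a list of rows of gray levels. It uses the standard 3 x 3 Sobel kernels, takes the per-pixel maximum gradient magnitude across the supplied channels (e.g., luminance and one chrominance component), and marks a pixel as "on an edge" when that magnitude exceeds `edge_thresh`. The threshold is not given in the text and is an assumption, as are the function names. Border pixels, where the 3 x 3 kernel does not fit, are given zero magnitude.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img):
    """Gradient magnitude from the 3 x 3 Sobel operator (0 on the border)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edge_density(channels, edge_thresh, n=5):
    """Count edge pixels in each n x n neighborhood (0..n*n).

    The edge strength at each pixel is the maximum Sobel magnitude over
    all channels; a pixel is an edge pixel if that exceeds edge_thresh.
    """
    mags = [sobel_magnitude(c) for c in channels]
    h, w = len(mags[0]), len(mags[0][0])
    edge = [[1 if max(m[y][x] for m in mags) > edge_thresh else 0
             for x in range(w)] for y in range(h)]
    r = n // 2
    return [[sum(edge[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]
```

Near the image border the neighborhood is clipped, so the maximum attainable density there is smaller than 25.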

In step S14, the left-eye or right-eye position is determined based on the result of the morphological "close" operation (I) and the edge density map (E). The minimum value in the left-eye ROI 82 or the right-eye ROI 83 is determined to be the respective eye position. The value of the constant c in step S14 ranges from zero to one; increasing the value of c emphasizes edge texture, and decreasing the value of c emphasizes the image itself. Preferably, the value of c is approximately 0.3 for robustness of the system.
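The eye search can be sketched, but with a caveat: the exact formula of step S14 appears only in Fig. 10 and is not reproduced in the text. The cost below is therefore a hypothetical stand-in chosen to match the description: it combines the filtered inverse image I and the edge density E with the weight c, is lowest where both are high (bright in the inverse image and edge-rich), and its minimum within each eye ROI is taken as the eye position. The split of the area above the nose at the nose column, the normalization constants and the function names are likewise assumptions.

```python
def eye_rois(face_roi, nose_row, nose_col):
    """Hypothetical left/right eye ROIs: the two halves of the face ROI
    above the nose, split at the nose column. Tuples are inclusive
    (top, left, bottom, right)."""
    top, left, _, right = face_roi
    ny, nx = int(nose_row), int(nose_col)
    return (top, left, ny - 1, nx), (top, nx + 1, ny - 1, right)


def eye_position(closed, density, roi, c=0.3, max_level=255, max_density=25):
    """Pixel in roi minimizing a hypothetical S14-style cost.

    closed  -- I, the min-max filtered inverse luminance image
    density -- E, the edge density map (0..25 for 5 x 5 neighborhoods)
    c       -- weight in [0, 1]; a larger c emphasizes edge texture
    """
    top, left, bottom, right = roi
    best_cost, best_pos = None, None
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            i_norm = closed[y][x] / max_level
            e_norm = density[y][x] / max_density
            cost = (1 - c) * (1 - i_norm) + c * (1 - e_norm)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (y, x)
    return best_pos
```

Setting c = 1 makes the search depend only on edge density, and c = 0 only on the filtered image, which mirrors the role of c described in the text.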

Similar to determining a left- or right-eye ROI, a mouth ROI 84 (shown in Fig. 7) is estimated to be the portion of the face ROI 72 below the determined nose position. The position of the mouth is preferably determined by a Gaussian-weighted Radon transformation (see, e.g., J.S. Lim, "Two-Dimensional Signal and Image Processing," Prentice-Hall, 1990, pp. 42-45, incorporated herein by reference). Since the horizontal coordinate of the center of the mouth is close to that of the nose position, a Radon transformation in a horizontal direction (i.e., a projection of the function at an angle θ = 0) is applied to the edge map from the edge map detector 18. A Gaussian function centered at the center of the mouth ROI 84 is used to weight the responses.

For example, Fig. 11 shows an edge map 85 of the mouth ROI 84 in which arrows 86 represent integration projections for the Radon transformation. A Gaussian filter 87 is applied to the Radon transformation. From a resulting Radon projection 88, corners L and R of the mouth are obtained. The corners L and R are found by starting at the center of the Radon projection 88 and moving toward the left or right to determine where a value of the Radon projection is less than a threshold (T_mouth). Since the Radon projection 88 drops sharply to zero at each end (i.e., beyond the corners L and R of the mouth), T_mouth may be selected to be any non-zero value, preferably in the range of 1-10.
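A sketch of the corner search, assuming the edge map is a list of rows of edge values and the mouth ROI is an inclusive (top, left, bottom, right) tuple. The θ = 0 projection is computed as a sum down each column, so it is a function of the horizontal position. The Gaussian width (`sigma`, defaulting here to a quarter of the ROI width) is not specified in the text and is an assumption, as is the name `mouth_corners`.

```python
import math


def mouth_corners(edge_map, roi, t_mouth=1.0, sigma=None):
    """Return the columns of mouth corners L and R within the mouth ROI.

    1. Radon projection at theta = 0: sum the edge map down each column.
    2. Weight the projection with a Gaussian centered on the ROI center.
    3. Walk left and right from the center while the weighted value is at
       least t_mouth; the last columns reached are taken as L and R.
    """
    top, left, bottom, right = roi
    proj = [sum(edge_map[y][x] for y in range(top, bottom + 1))
            for x in range(left, right + 1)]
    n = len(proj)
    if sigma is None:
        sigma = n / 4.0                      # assumed Gaussian width
    mid = (n - 1) / 2
    weighted = [p * math.exp(-((i - mid) ** 2) / (2 * sigma ** 2))
                for i, p in enumerate(proj)]
    start = n // 2
    lo = start
    while lo > 0 and weighted[lo - 1] >= t_mouth:
        lo -= 1
    hi = start
    while hi < n - 1 and weighted[hi + 1] >= t_mouth:
        hi += 1
    return left + lo, left + hi
```

Because the projection is zero outside the mouth edges, any positive `t_mouth` stops the walk there, which is the property the text relies on when it allows any non-zero threshold.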

After the corners L and R are determined, the mouth ROI 84 is adjusted (i.e., reduced) accordingly. Using the adjusted mouth ROI 84, the vertical position of the center of the mouth is searched for. A Radon transformation is applied in the vertical direction of the edge map of the adjusted mouth ROI 84. The position of maximum response is identified as the vertical position of the center of the mouth. In order to search for the upper and lower lip positions, the vertical Radon responses are examined again. The two positions above the center having the largest responses are identified as the boundaries of the upper lip. Similarly, the boundaries of the lower lip are chosen from the positions below the center.
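The vertical search can be sketched the same way. Here the vertical Radon projection is taken as a sum along each row of the adjusted mouth ROI, so it is a function of the vertical position. Following the text literally, the two rows with the largest responses above the center are returned as the upper-lip boundaries and the two below it as the lower-lip boundaries; a practical implementation might restrict the choice to local peaks. The function name and the ROI convention are assumptions.

```python
def mouth_center_and_lips(edge_map, roi):
    """Return (center_row, upper_lip_rows, lower_lip_rows) for the adjusted
    mouth ROI, using a row-wise projection of the edge map."""
    top, left, bottom, right = roi
    proj = [sum(edge_map[y][left:right + 1]) for y in range(top, bottom + 1)]
    center = max(range(len(proj)), key=lambda i: proj[i])  # maximum response

    def two_largest(indices):
        # Two rows with the largest responses, returned in top-to-bottom order.
        return sorted(sorted(indices, key=lambda i: proj[i], reverse=True)[:2])

    upper = two_largest(range(0, center))
    lower = two_largest(range(center + 1, len(proj)))
    return (top + center,
            [top + i for i in upper],
            [top + i for i in lower])
```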
The system 10 then outputs the coordinates of the facial features of the subject 64. The system 10 may also output the left frame 60 with indications of the facial features and the various ROIs outlined or highlighted.

The invention has numerous applications in the field of surveillance and security systems or in any application in which face recognition is required. The invention also has applications in video conferencing.

Typically in video conferencing, a majority of the picture data in any given scene consists of irrelevant information, for example objects in the background. Compression algorithms cannot distinguish between relevant and irrelevant objects, and if all of this information is transmitted on a low-bandwidth channel, the result is a delayed, "jumpy" looking video of a video conference participant. The present invention, for example, allows the face of a participant to be identified so that it may be transmitted at a different rate than the background information. This allows the movements of the face to be in synchronization with the audio and prevents a "jumpy" look.

While the present invention has been described above in terms of specific embodiments, it is to be understood that the invention is not intended to be confined or limited to the embodiments disclosed herein. For example, the invention is not limited to any specific type of filtering or mathematical transformation or to any particular input image scale or orientation. On the contrary, the present invention is intended to cover various structures and modifications thereof included within the spirit and scope of the appended claims.
CLAIMS:

1. An image processing device (10) comprising:

a disparity detector (11) arranged to compare locations of like pixel information in a pair of images (60,61) to determine disparity information;

a region detector (12) which identifies a region of interest (72) in one of the images (60) in accordance with the disparity information, the region of interest (72) including a plurality of facial features; and

a first position detector (13) coupled to the region detector (12) which identifies a position of one of the facial features in accordance with the disparity information.

2. The image processing device according to Claim 1, wherein the one facial feature is a nose.

3. The image processing device according to Claim 2, wherein the position (82) corresponds to a location (80,81) where a disparity value is largest as compared to other disparity values in the region of interest (79).

4. The image processing device according to Claim 3, wherein the first position detector includes a sub-region detector (12) which identifies a portion (79) of the region of interest (72) in which the nose is located before comparing the disparity values.

5. The image processing device according to Claim 2, further comprising a second position detector (14) coupled to the first position detector (13) which identifies a location of another facial feature in accordance with the position of the nose.

6. The image processing device according to Claim 5, wherein the other facial feature is a left or right eye, and the second position detector (14,15) identifies an approximate area (82,83) for the location of the left or right eye based upon the position of the nose.
7. The image processing device according to Claim 1, wherein the pair of images (60,61) are received from a stereo pair of cameras (62,63) in a video conference system.

8. An image processing apparatus (10), comprising:

a disparity detector (11) arranged to compare locations of like pixel information in a pair of images (60,61) to determine disparity information;

an outline identifier (12) coupled to the disparity detector (11) which determines approximate boundaries (74,75,76,77) of a face (72) in one of the images (60) based upon a comparison of a predetermined threshold value (T_dm) and the disparity information from the disparity detector (11); and

a nose position identifier (13) coupled to the disparity detector (11) and the outline identifier (12) which identifies a position of a nose in the face in accordance with the disparity information within a center region (79) of the face (72).

9. A method of determining positions of facial features in an image comprising the steps of:

calculating a disparity (Fig. 6) between a pair of images (60,61);

determining (Fig. 8) a face region of interest (72) (ROI) in at least one of the images (60); and

identifying a nose position (82) within the face region of interest (72) in accordance with the calculated disparity.

10. The method according to Claim 9, wherein the calculating step includes:

identifying locations of like pixels in each of the pair of images (60,61); and

calculating a difference between the locations of like pixels.

11. The method according to Claim 9, wherein the determining step includes determining for each set of like pixels whether a disparity value between the locations falls above or below a predetermined threshold (T_dm), and if so identifying a scan line or pixel column as a boundary line (74,75,76,77) for the face ROI (72).

12. The method according to Claim 9, wherein the identifying step includes:

determining a center region (79) of the face ROI;

calculating a histogram to determine a largest disparity value within the center region;

if more than one area (80,81) within the center region is determined in the step of using the histogram, determining which area is largest (81) as compared to the other areas (80); and

using a center (82) of the largest area (81) as the nose position.

13. A computer-readable memory medium including code for processing a pair of images (60,61), the code comprising:

disparity detecting code to compare locations of like pixel information in a pair of images to determine disparity information;

region detecting code to identify a region of interest in one of the images in accordance with the disparity information, the region of interest including a plurality of facial features; and

first position detecting code to identify a position of one of the facial features in accordance with the disparity information.

14. The memory medium according to Claim 13, wherein the one facial feature is a nose and wherein the position corresponds to a location where a disparity value is largest as compared to other disparity values in the region of interest.